Statistical Morphological Analyzer for Hindi
Abstract
Morphology is the study of the internal structure of words, and morphological analysis is an essential early step in many NLP applications such as parsing and machine translation. Researchers working in Hindi NLP have used either the widely popular paradigm-based analyzer (PBA) or extensions of it. In this work, we undertake a comprehensive evaluation of PBA using data from the Hindi Treebank (HTB) and present a new morphological analyzer trained on the HTB. Our morphological analyzer has better coverage and accuracy than the existing analyzers for Hindi. An oracle system that takes the best values from the PBA's output achieves an accuracy of only 63.41% for lemma, gender, number, person and case. Our statistical analyzer achieves an accuracy of 84.16% for these morphological attributes when evaluated on the test section of the Hindi Treebank.
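The oracle evaluation mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the analyzer returns a list of candidate analyses per token, that the oracle picks the candidate agreeing with the gold analysis on the most attributes, and that a token counts as correct only when all five attributes match. The function names and the toy data are hypothetical.

```python
from typing import Dict, List

# The five attributes named in the abstract.
ATTRS = ("lemma", "gender", "number", "person", "case")

Analysis = Dict[str, str]


def oracle_pick(candidates: List[Analysis], gold: Analysis) -> Analysis:
    """Return the candidate that agrees with gold on the most attributes.

    An empty dict is returned when the analyzer produced no candidates.
    """
    if not candidates:
        return {}
    return max(
        candidates,
        key=lambda c: sum(c.get(a) == gold.get(a) for a in ATTRS),
    )


def all_match(pred: Analysis, gold: Analysis) -> bool:
    """True only if every attribute in ATTRS matches (assumed joint scoring)."""
    return all(pred.get(a) == gold.get(a) for a in ATTRS)


def oracle_accuracy(candidate_sets: List[List[Analysis]],
                    golds: List[Analysis]) -> float:
    """Fraction of tokens whose best candidate matches gold on all attributes."""
    if not golds:
        return 0.0
    correct = sum(
        all_match(oracle_pick(cands, gold), gold)
        for cands, gold in zip(candidate_sets, golds)
    )
    return correct / len(golds)


# Toy data (made up for illustration, not taken from the Hindi Treebank).
golds = [
    {"lemma": "ladakA", "gender": "m", "number": "pl", "person": "3", "case": "d"},
    {"lemma": "kitAba", "gender": "f", "number": "sg", "person": "3", "case": "d"},
    {"lemma": "xyz", "gender": "m", "number": "sg", "person": "3", "case": "d"},
]
candidate_sets = [
    [  # two candidates; the second one matches gold completely
        {"lemma": "ladakA", "gender": "m", "number": "sg", "person": "3", "case": "o"},
        {"lemma": "ladakA", "gender": "m", "number": "pl", "person": "3", "case": "d"},
    ],
    [  # one candidate with the wrong case value
        {"lemma": "kitAba", "gender": "f", "number": "sg", "person": "3", "case": "o"},
    ],
    [],  # word not covered by the analyzer
]

print(oracle_accuracy(candidate_sets, golds))
```

Under these assumptions the oracle score is an upper bound on what any disambiguator choosing among the analyzer's candidates could reach, which is why a low oracle score points to a coverage problem rather than a ranking problem.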
Similar resources
Context Based Statistical Morphological Analyzer and its Effect on Hindi Dependency Parsing
This paper revisits the work of Malladi and Mannem (2013), which focused on building a Statistical Morphological Analyzer (SMA) for Hindi, and compares the performance of SMA with another existing statistical analyzer, Morfette. We evaluate SMA in various experimental scenarios and look at how it performs on unseen words. The later part of the paper presents the effect of the predicted morph ...
Hindi Derivational Morphological Analyzer
Hindi is an Indian language which is relatively rich in morphology. A few morphological analyzers of this language have been developed. However, they give only inflectional analysis of the language. In this paper, we present our Hindi derivational morphological analyzer. Our algorithm upgrades an existing inflectional analyzer to a derivational analyzer and primarily achieves two goals. First, ...
Morphological Analyser for Hindi – A Rule Based Implementation
Morphological analysis is an important part of Natural Language Processing. It makes the task of machine translation much easier. A morphological analyzer can be implemented effectively for a language that is rich in morphemes. Hindi is a morphologically rich language. In this paper we focus on the design of a morphological analyzer for the Hindi language. The analyzer takes a Hindi sentence...
Part-of-Speech Tagging and Chunking with Maximum Entropy Model
This paper describes our work on part-of-speech (POS) tagging and chunking for Indian languages, for the SPSAL shared task contest. We use a Maximum Entropy (ME) based statistical model. The tagger makes use of morphological and contextual information of words. Since only a small labeled training set is provided (approximately 21,000 words for all three languages), a ME based approach does not y...
Bengali and Hindi to English Cross-language Text Retrieval under Limited Resources
This paper describes our experiments on two cross-lingual and one monolingual English text retrieval tasks at CLEF in the ad-hoc track. The cross-language task includes the retrieval of English documents in response to queries in the two most widely spoken Indian languages, Hindi and Bengali. For our experiment, we had access to a Hindi-English bilingual lexicon, 'Shabdanjali', consisting of approx. 26K H...
Publication date: 2013